Lossless data compression
Abstract
This thesis makes several contributions to the field of data compression. Lossless data compression algorithms shorten the description of input objects, such as sequences of text, in a way that allows perfect recovery of the original object. Such algorithms exploit the fact that input objects are not uniformly distributed: by allocating shorter descriptions to more probable objects and longer descriptions to less probable objects, the expected length of the compressed output can be made shorter than the object’s original description. Compression algorithms can be designed to match almost any given probability distribution over input objects. This thesis employs probabilistic modelling, Bayesian inference, and arithmetic coding to derive compression algorithms for a variety of applications, making the underlying probability distributions explicit throughout. A general compression toolbox is described, consisting of practical algorithms for compressing data distributed according to various fundamental probability distributions, and mechanisms for combining these algorithms in a principled way. Building on the compression toolbox, new mathematical theory is introduced for compressing objects with an underlying combinatorial structure, such as permutations, combinations, and multisets. An example application is given that compresses unordered collections of strings, even if the strings in the collection are individually incompressible. For text compression, a novel unifying construction is developed for a family of context-sensitive compression algorithms. Special cases of this family include the PPM algorithm and the Sequence Memoizer, an unbounded-depth hierarchical Pitman–Yor process model. It is shown how these algorithms are related, what their probabilistic models are, and how they produce fundamentally similar results. The work concludes with experimental results, example applications, and a brief discussion of cost-sensitive compression and adversarial sequences.
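Two ideas from the abstract can be illustrated numerically: giving shorter descriptions to more probable objects, and the saving available when the order of a collection does not matter. The following is a minimal sketch, not the thesis's own code. The four-symbol distribution and the function names are hypothetical, and the ordering calculation assumes distinct elements whose orderings are all equally likely. The ideal length of a symbol with probability p is -log2(p) bits, a figure that an arithmetic coder can approach closely over a long sequence.

```python
import math

def ideal_code_lengths(probs):
    """Ideal code length in bits for each symbol: -log2(p)."""
    return {symbol: -math.log2(p) for symbol, p in probs.items()}

def expected_length(probs, lengths):
    """Expected number of bits per symbol under the distribution `probs`."""
    return sum(probs[symbol] * lengths[symbol] for symbol in probs)

def order_information_bits(n):
    """Bits needed to say which of the n! orderings of n distinct items
    occurred, assuming all orderings are equally likely. This is the most
    that can be saved by treating the items as an unordered collection."""
    return math.log2(math.factorial(n))

# Hypothetical, non-uniform distribution over four symbols (an assumption
# for illustration, not data from the thesis).
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

lengths = ideal_code_lengths(probs)
model_bits = expected_length(probs, lengths)  # matched to the distribution
uniform_bits = math.log2(len(probs))          # fixed-length code

print(lengths)
print(f"expected bits per symbol: {model_bits} (fixed-length: {uniform_bits})")
print(f"order information for 4 distinct items: {order_information_bits(4):.3f} bits")
```

In this example the matched code averages 1.75 bits per symbol against 2 bits for a fixed-length code. The ordering figure is independent of the items' contents, which is consistent with the abstract's remark that an unordered collection can be compressed even when each string is individually incompressible.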
Related resources
Lossless Microarray Image Compression by Hardware Array Compactor
Microarray technology is a new and powerful tool for concurrent monitoring of a large number of gene expressions. Each microarray experiment produces hundreds of images. Each digital image requires a large storage space. Hence, real-time processing and transmission of these images necessitate efficient and custom-made lossless compression schemes. In this paper, we offer a new archi...
Data Compression for Network GIS
Data compression algorithms can be categorized as lossless or lossy. Bit streams generated by a lossless compression algorithm can be faithfully recovered to the original data. If the loss of a single bit may cause serious and unpredictable consequences in the original data (for example, in text and medical image compression), a lossless compression algorithm should be applied. If data consumers can toler...
Comparison of Different Methods for Lossless Medical Image Compression
Here, the concept of compression theory and lossless image compression methods are studied. In order to preserve the value of diagnostic medical images, it is necessary to provide lossless image compression. Apart from practical reasons, there are often legal restrictions on the lossless medical image compression. Lossless data compression has been suggested for many space science exploration m...
A Novel Real Time Algorithm for Remote Sensing Lossless Data Compression based on Enhanced DPCM
In this paper, the simplicity of prediction models for image transformation is used to introduce a low-complexity and efficient lossless compression method for LiDAR rasterized data and RS grayscale images, based on improving the energy compaction ability of prediction models. Further, the proposed method is applied to some RS images and LiDAR test cases, and the results are evaluated and compared with oth...
Lossless and Lossy Data Compression
Data compression (or source coding) is the process of creating binary representations of data which require less storage space than the original data [7; 14; 15]. Lossless compression is used where perfect reproduction is required while lossy compression is used where perfect reproduction is not possible or requires too many bits. Achieving optimal compression with respect to resource constraint...
A Comparison of Lossless Compression Methods for Palmprint Images
In this work, lossless grayscale image compression methods are compared on a public palmprint image database. The total number of test images was about 7752, with 8 bit rates. The effect of different lossless compression algorithms on the compression ratios obtained when processing palmprint sample data is investigated; in particular, we relate the application of CALIC, JPEG-LS, RAR, JPEG2000 (los...
Publication date: 2015